ABSTRACT


The exponentially growing and tremendous collection of data stored in the
power sector, combined with the need for data analysis, has produced an [...]
explicitly show how raw data can be turned into insights, the study deploys Hadoop,
using Hortonworks’ distribution of the open-source, Apache-licensed Hive data
warehousing framework running on a Windows operating system, to analyze the
distribution and consumption of power from raw datasets (Excel files converted
to .csv format) obtained from the prepaid meters of 196,000 consumers (households
and businesses) in 11 business units of Ikeja Electricity Distribution Company
(IKEDC, Nigeria).

Keywords: Big data; Electricity; Energy; Power consumption
1. INTRODUCTION

About 2.5 quintillion bytes of data are produced daily, and almost 90 percent of the data in the world today
has been created in the last two years alone. This data comes from various sources: climate information,
social media, digital pictures and videos posted on the internet, cell phone GPS signals, and purchase
transaction records [1]. Such a colossal amount of continuously produced data is what is termed big data.
The growth of technologies and services brings an increase in the amount of data. This data can be
structured or unstructured and comes from different sources, which may contain billions of records of
people's information from social media, web sales, audio and images [2]. There has been enormous demand
for data storage and processing in the twenty-first century. To support the processing of these larger
volumes of data, cloud computing was developed and has been used successfully for data storage and
processing [3].

Big data is a concept that can be applied in any industry, and vast amounts of data can be turned to one's
advantage; the focus here, however, is on the Nigerian Electricity Distribution Companies (Discos), which
provide supply value chain services in electricity distribution. Discos act as the middlemen between
customers and the electricity grid. Navigant Research estimated that the number of smart meters installed
worldwide would surpass 1.1 billion by 2022 [4].

Big data analytics is the process of extracting and analyzing useful information to uncover hidden
patterns in data, mostly through advanced techniques such as data mining and statistical methods [5]. Big
data has grown so large that it is difficult to work with using traditional database management systems
[6], [7]. Business analytics is one of the new technologies characterized by low risk and quick payback;
through domain-specific analytics, it can help organizations understand and improve their business and
leverage the opportunities presented by abundant data [8]. Without analysis, big data has little value. Its
potential lies in supporting effective decision making, and to realize this, organizations need an efficient
method of turning high volumes of diverse data into meaningful insights. There is widespread awareness of
the value of improving building energy efficiency to save energy and make buildings more sustainable. One
successful way to achieve this is to discover and derive useful information from building operational data.

As these technologies advance, smart grid deployment also increases, serving as a driving force for the
adoption of big data analytics. Their development has improved the ability to capture, from the meter, the
electricity used by customers at any time of the day [9]. Many countries are still campaigning for the
adoption of smart grid technologies. European Union member states have initiated the installation of smart
meters as a way to improve the efficiency of the energy system, with a target of 80 percent rollout by 2020.
The development of smart grid and smart meter infrastructure has given electric power IT and business
leaders a new level of data. This extends big data analytics to many areas of technology and business
intelligence [8].

Analyzing smart grid data makes it possible to identify clusters with excessive electrical load, clusters
with high power outage frequencies, and lines with a high probability of failure. As a result, it is
possible, for example, to plan grid upgrades, transformations and maintenance, and to forecast effectively
for energy management [10], [11]. An ideal power grid continually balances power generation and
consumption for grid stability. The traditional power grid is based on a one-directional flow from
centralized power generation, through the transmission lines, to the distribution grid, which does not allow
reverse flow. Currently, the boom in renewable energy, local energy production (by the former consumer,
called a “pro-consumer”), electric mobility with rechargeable battery systems, energy storage and many other
applications have forced the decentralization of the power system. Specifically, the power grid system is
evolving from centralized power plants to micro ones, where each local consumer may become a producer
through, for example, photovoltaic panels installed at home, storage, and fuel cell applications. In this new
scenario, the electronic meter is the gateway that connects the consumer/pro-consumer to the power grid. It is
a key device for big data: data are frequently and rapidly acquired and subsequently analyzed by the power
utilities [11], [12].

Since 2000, the Enel company has replaced 38 million analog meters with electronic meters, becoming
the first power utility in the world ready for smart grid applications. The increasing number of distributed
power stations, such as wind farms, combined heat and power units, mini hydro plants and photovoltaic
systems, can be aggregated into a virtual power plant (VPP). A VPP can replace a conventional power plant
while providing more flexibility and higher efficiency. However, a VPP is a complex system that requires
sophisticated control optimization and secure communication. Big data analysis may help resolve these
problems and increase the reliability of the system.

Currently, wind and solar energy resources are being connected to power grids; however, the generation
capacity of these new energy resources is closely tied to the randomness and intermittency of climatic
conditions. Such intermittent renewable sources can be managed efficiently only if the big data of power
grids is analyzed effectively. This analysis helps allocate the energy generated by the new resources to
regions with electricity shortages [13]-[15].

Several authors have studied both daily and seasonal trends in energy consumption, but because
consumption patterns depend on geographic region, most of these studies focused on consumption within a
specific country. The authors of [16] studied electricity demand in India based on aggregate macro data at
both the national and state levels. Their work uses econometric analysis to determine the effect of income
on electricity consumption and the relationship between consumption, gross domestic product per capita and
the price of electricity over a given period of time.

Yigzaw and Yohanis [17] studied residential billing systems in Hong Kong and the United Kingdom,
respectively, taking energy consumption patterns as the basic units of analysis. In [18], the author used
MapReduce and Apache Hadoop big data techniques to analyze and generate insights from collected energy
data in order to improve energy efficiency. He also used the collected energy data to evaluate different
prediction models and forecast future consumption on the basis of previous consumption.
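To make the MapReduce technique mentioned in [18] concrete: an aggregation such as total consumption per business unit splits into a map step that emits key-value pairs, a shuffle that groups them by key, and a reduce step that combines each group. Hive performs the same kind of aggregation when it runs a GROUP BY query as MapReduce jobs. The single-process Python sketch below illustrates only the technique; it is not the code used in [18], and the business-unit keys and kWh values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (business_unit, monthly_consumption_kwh).
Record = Tuple[str, int]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Map: emit one (key, value) pair per reading, keyed by business unit."""
    for business_unit, kwh in records:
        yield business_unit, kwh


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group values by key (Hadoop does this between map and reduce)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into one total."""
    return {key: sum(values) for key, values in groups.items()}


def total_by_business_unit(records: Iterable[Record]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    readings = [("BU-A", 120), ("BU-B", 80), ("BU-A", 60)]
    print(total_by_business_unit(readings))  # {'BU-A': 180, 'BU-B': 80}
```

In a real cluster the map and reduce functions run in parallel on different nodes; only the data flow is reproduced here.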

Whaley and Saman [19] used smart meters to gather the daily energy consumption of various households.
Household energy consumption was segmented by electric appliance and by new heat-generation technologies,
which makes it easy to identify consumers with similar needs and behaviors. Big data analysis plays a key
role in the evolution of smart grids: it helps extract and analyze energy consumption data from smart grids,
enabling efficient management and the discovery of hidden consumption patterns [20]-[23].


2. RESEARCH METHOD 

The material used in this research is the data of about 196,000 homes and businesses that are
customers of Ikeja Electric, with one prepaid meter per entity. The smart meters generated data every day of
each month, and one year of data is available. All data was provided in Excel format and later converted to
CSV format. This primary data was obtained from the database of IKEDC’s monitoring department; secondary
data was gathered from recent whitepapers, research materials, and texts about big data analytics in the
cloud and Nigerian electric power distribution. Hive, as a data warehousing tool, and the Hadoop Distributed
File System (HDFS) can ingest various structured, unstructured and semi-structured datasets, but for the
purpose of this study, and given the peculiarities of the data collection device (the smart meter),
structured datasets were used.


Table 1. Data fields description

S/N  Attribute name                        Description

1    Feeder                                A feeder line is part of an electric distribution network,
                                           usually a radial circuit of intermediate voltage.

2    Undertaking Office (UT)               A field office of Ikeja Electric for the distribution of
                                           electric power within a business unit.

3    Business Unit (BU)                    The various business divisions of the distribution company.

4    Tariff Plan                           The tariff plan a customer belongs to.

5    Type                                  The type of metering device, i.e. AMR or AMI.

6    Customer numbers                      The unique identifier for each consumer.

7    Monthly Consumption (October 2015     Electric power consumption for each month, in
     through October 2016)                 kilowatt-hours (kWh).

8    Account Numbers                       Customer account number for recharging the prepaid meter.

9    Address                               The household or office where the prepaid meter is
                                           installed.

10   Average                               Average monthly power consumption.


The Apache Hive data warehouse software facilitates querying and managing large datasets residing in
distributed storage. Hive is a valuable asset for ELT, data warehousing and data storage on Hadoop. However,
compared with conventional databases, it is relatively slow, and it does not offer all of the structured query
language (SQL) functionality or database features that standard databases do. Nevertheless, it supports
SQL-like querying, functions as a repository, and gives more users, including non-programmers, access to
Hadoop technologies. It also provides a way of converting unstructured and semi-structured data into
structured, schema-based data. Hive can be used to build a master data processing system or a data
warehouse, but the techniques that make it an effective ELT tool must first be learned [24-27]. As shown in
Figure 1, the transformation process essentially puts the datasets into the format most suitable for analysis.
This entails creating a database and a temporary table, through which the data is transferred from HDFS into
Hive, and thereafter creating an appropriate schema that represents the fields and records needed for
querying.
Figure 1. Phases involved in generating results of analysis
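The two-stage load described above (raw lines first staged in a temporary table, then given a typed schema) can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's pipeline: the four-column schema is a hypothetical subset of the table in Figure 2, and turning an unparseable INT field into None is meant to mirror Hive returning NULL when a string cannot be cast to INT.

```python
import csv
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical subset of the Figure 2 schema: (column name, Hive type).
SCHEMA: List[Tuple[str, str]] = [
    ("Customer_no", "STRING"),
    ("BU", "STRING"),
    ("Oct_2015", "INT"),
    ("Nov_2015", "INT"),
]

Value = Optional[Union[str, int]]


def cast_value(raw: str, hive_type: str) -> Value:
    """Cast one raw field; an INT field that cannot be parsed becomes None (NULL)."""
    if hive_type == "INT":
        try:
            return int(raw.strip())
        except ValueError:
            return None
    return raw


def stage_raw_lines(csv_text: str) -> List[str]:
    """Stage 1: keep each data line as one string, like a one-column temporary table."""
    lines = csv_text.splitlines()
    return [line for line in lines[1:] if line.strip()]  # skip header and blank lines


def apply_schema(raw_lines: List[str]) -> List[Dict[str, Value]]:
    """Stage 2: split each staged line on commas and cast fields to the schema types."""
    rows = []
    for fields in csv.reader(raw_lines):
        # Pad short rows with empty strings; missing INT fields then become None.
        fields += [""] * (len(SCHEMA) - len(fields))
        rows.append({name: cast_value(value, hive_type)
                     for (name, hive_type), value in zip(SCHEMA, fields)})
    return rows


if __name__ == "__main__":
    exported = "Customer_no,BU,Oct_2015,Nov_2015\nC001,BU-A,120,n/a\nC002,BU-B,95\n"
    for row in apply_schema(stage_raw_lines(exported)):
        print(row)
```

Keeping the raw lines untouched in the first stage means a schema mistake can be corrected by re-running only the second stage, which is the main appeal of the staging-table approach.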


3. RESULTS AND ANALYSIS 

The Hive tool used in this research is an open-source, Apache-licensed data warehousing framework
that can be either configured manually or used pre-configured. A pre-configured machine from Hortonworks,
the vendor’s Hadoop distribution, was used in this study. The Hortonworks Data Platform hosted on Microsoft
Azure was used for this analysis because of its low system requirements, low cost, and ease of use. The
language used in this analysis is HiveQL, the query language of the Hive data warehousing environment,
which is similar to SQL as used in systems such as MySQL. Figure 2 shows the statement that creates a table
for running queries on the data.


CREATE TABLE IF NOT EXISTS smart_meter (
    Customer_no STRING COMMENT 'Customers meter number',
    Tariff_Plan STRING COMMENT 'Individual tariff plans',
    Address STRING COMMENT 'Home address',
    UT STRING COMMENT 'Nearest Undertaking office',
    DT STRING COMMENT 'Location of distribution Transformer',
    Feeder STRING COMMENT 'Electricity Lines',
    BU STRING COMMENT 'Business Units',
    Type STRING COMMENT 'AMR OR AMI meter',
    Oct_2015 INT COMMENT 'Total power consumption for the month',
    Nov_2015 INT,
    Dec_2015 INT,
    Jan_2016 INT,
    Feb_2016 INT,
    Mar_2016 INT,
    Apr_2016 INT,
    May_2016 INT,
    Jun_2016 INT,
    Jul_2016 INT,
    Aug_2016 INT,
    Sept_2016 INT,
    Oct_2016 INT,
    Average INT
);


Figure 2. Creating a table for running queries on data 


After the dataset has been imported, the pre-process panel is displayed. Once the data is loaded, Hive
lists the recognized attributes in the ‘attributes box’ at the left of the pre-process panel. The ‘Table Name’
window above the ‘attributes box’ displays the field names and types. Clicking the rightmost icon beside the
table name displays the data in the table in the result section, together with the minimum, maximum, mean
and standard deviation of the selected attribute. The most significant pieces of power distribution
equipment are the distribution transformers and feeders, and this analysis explores how heavily or lightly
loaded this equipment is, to support effective decision making. Figure 3 shows the view in which the
datasets are loaded into their permanent location by matching the column values of the datasets loaded into
the temporary table.


[Screenshot: Hive worksheet running "insert overwrite table prepaid_meter SELECT regexp_extract(col_value, '^(?:([^, ..." with one regexp_extract call per column]
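The query in the screenshot appears to follow a common Hive idiom for splitting a one-column staging table: regexp_extract(col_value, '^(?:([^,]*),?){n}', 1) returns the n-th comma-separated field, because a repeated capture group keeps only its last iteration. The {n} suffix is an assumption, since the pattern is truncated in the screenshot. The Python sketch below reproduces the idiom with the re module, which, like the Java regex engine Hive relies on, keeps the last capture of a repeated group. Its regexp_extract function is a hypothetical stand-in, not Hive's implementation, and returns an empty string when nothing matches.

```python
import re
from typing import List


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    """Hypothetical stand-in for Hive's regexp_extract: first match's group `index`, or ''."""
    match = re.search(pattern, subject)
    if match is None:
        return ""
    return match.group(index) or ""


def nth_field_pattern(n: int) -> str:
    """Repeat 'field, optional comma' exactly n times; group 1 then holds the n-th field."""
    return r"^(?:([^,]*),?){%d}" % n


def split_line(col_value: str, num_fields: int) -> List[str]:
    """Emulate one SELECT row: one regexp_extract call per output column."""
    return [regexp_extract(col_value, nth_field_pattern(i), 1)
            for i in range(1, num_fields + 1)]


if __name__ == "__main__":
    print(split_line("C001,BU-A,,120", 4))  # ['C001', 'BU-A', '', '120']
```

The idiom does not understand quoted fields, so a CSV value containing a comma would be split in the wrong place; exports with such values need a proper CSV SerDe or pre-cleaning.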


- Distribution transformers and feeders 

The analysis depicted in the graphs below clearly shows the transformers whose load is above or below
the average (84 kWh) for a month, and therefore which distribution transformers need load reduction in the
different localities. Figure 4 shows the transformers exceeding the average monthly load, while Figure 5
shows the transformers below it. Figure 6 shows the feeders (electric lines) that carry optimum loads and
those that are overloaded and may need prompt replacement, or additional lines in the locality.
Figure 3. Putting extracted data into the table
Figure 4. Transformers exceeding the average monthly load
Figure 5. Transformers below the average monthly load
Figure 6. Feeders exceeding the average monthly load
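The above/below-average split behind Figures 4 to 6 amounts to a grouped average compared against a threshold. In HiveQL this would typically be a GROUP BY on the transformer column with AVG in a HAVING clause, though the study's actual queries are not shown. The Python sketch below is hypothetical: it assumes monthly loads per transformer are already available, uses made-up transformer IDs, and takes the threshold from the caller (for example, the 84 kWh average the study reports).

```python
from statistics import mean
from typing import Dict, List, Tuple


def split_by_average_load(
    monthly_load_kwh: Dict[str, List[float]], threshold_kwh: float
) -> Tuple[List[str], List[str]]:
    """Return (above, at_or_below) lists of transformer IDs by average monthly load."""
    # Transformers with no readings are skipped rather than counted as zero.
    averages = {dt: mean(loads) for dt, loads in monthly_load_kwh.items() if loads}
    above = sorted(dt for dt, avg in averages.items() if avg > threshold_kwh)
    at_or_below = sorted(dt for dt, avg in averages.items() if avg <= threshold_kwh)
    return above, at_or_below


if __name__ == "__main__":
    loads = {"DT-1": [90, 100], "DT-2": [50, 60], "DT-3": [84, 84]}
    above, below = split_by_average_load(loads, 84.0)
    print(above, below)  # ['DT-1'] ['DT-2', 'DT-3']
    print(f"share above threshold: {len(above) / len(loads):.0%}")  # 33%
```

The same function applies unchanged to feeders by keying the input on feeder name instead of transformer ID.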


- Effects of climatic conditions on consumption 

Figure 7 shows the rainfall statistics for 2016 in Lagos; it is used to identify the months with the most
and the least rainfall. A query was then run for the average power consumption in those months, and the
results are shown in Figure 8. Figure 9 shows the months and their corresponding sunshine intensity, while
Figure 10 shows the average power consumption in these months. Figure 11 displays the extent to which
consumption fell short of the average quota, together with the computed standard deviation of consumption
for the year.
Figure 7. Lagos rainfall statistics for 2016 [28]
Figure 8. Power consumption in months with high and low rainfall
Figure 9. Sunshine stats for Lagos in 2015/2016 [28]
Figure 10. Average power consumption in the months shown in Figure 9
Figure 11. Standard deviation for consumption in a year
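Comparing consumption in high- and low-rainfall months (Figure 8) reduces to averaging the corresponding month columns of the table in Figure 2, for example Jan_2016 and Jun_2016. The following is a minimal Python sketch, assuming rows are dictionaries keyed by those column names and that missing readings are None; the sample values are made up and are not the study's data.

```python
from statistics import mean
from typing import Dict, List, Optional


def average_by_month(
    rows: List[Dict[str, Optional[int]]], months: List[str]
) -> Dict[str, float]:
    """Average consumption per month column, ignoring missing (None) readings."""
    result: Dict[str, float] = {}
    for month in months:
        values = [row[month] for row in rows if row.get(month) is not None]
        # NaN marks a month with no readings at all instead of raising an error.
        result[month] = mean(values) if values else float("nan")
    return result


if __name__ == "__main__":
    sample = [
        {"Jan_2016": 100, "Jun_2016": 50},
        {"Jan_2016": 80, "Jun_2016": None},
    ]
    averages = average_by_month(sample, ["Jan_2016", "Jun_2016"])
    print(averages)
    print(averages["Jan_2016"] / averages["Jun_2016"])  # 1.8
```

Skipping missing readings, rather than treating them as zero, mirrors how SQL AVG ignores NULL values.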


- Degree of deviation from the average 
- Interpretation of results 

a. During the analysis, it was found that the total energy consumed in the area under consideration by
about 196,000 consumers is 7.6 GWh (i.e. 7.6 billion watt-hours); this insight could be useful for load
requests to, and load estimation by, the transmission companies. The average monthly consumption of a
household is 84 kWh, a value that can support cost analysis and projected revenue figures for a year.

b. The equipment analysis provides knowledge in the form of load analysis for the distribution equipment;
it shows that many electric lines carry more than the optimum load. The graphs also show that the
transformers that are typically overloaded (about 30% of them) need their loads reduced.

c. The analysis of the effects of climatic conditions on power availability contradicts the common belief
that more power is available or consumed during periods of high rainfall. The results show that twice as
much power was consumed in January (a period of low rainfall) as in June (a typical month with high
rainfall).

d. The standard deviation value (0.64) indicates that 60% of households did not consume up to the
average of 84 kWh, which reflects the level of non-availability of power across households in the 11
business divisions.
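The standard deviation and the share of households below the average are two different statistics: the first measures spread around the mean, the second is a proportion obtained by counting. The following minimal Python sketch computes both from per-household average consumption; the input values are hypothetical. In HiveQL, the corresponding aggregates would be AVG and STDDEV_POP (or STDDEV_SAMP), with the count below the mean obtained by comparing each row against the computed average.

```python
from statistics import mean, pstdev
from typing import Dict, List


def consumption_summary(avg_kwh_per_household: List[float]) -> Dict[str, float]:
    """Mean, population standard deviation, and share of households below the mean."""
    if not avg_kwh_per_household:
        raise ValueError("need at least one household")
    overall_mean = mean(avg_kwh_per_household)
    below = sum(1 for v in avg_kwh_per_household if v < overall_mean)
    return {
        "mean": overall_mean,
        "std_dev": pstdev(avg_kwh_per_household),
        "share_below_mean": below / len(avg_kwh_per_household),
    }


if __name__ == "__main__":
    summary = consumption_summary([2, 4, 4, 4, 5, 5, 7, 9])
    print(summary)
```

The population form (pstdev) treats the households as the whole population of interest; the sample form would divide by n - 1 instead.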


4. CONCLUSION 

Among the several big data analytics methods for deriving insights from large data pools, Hive provides
an easy and convenient one. This study shows a successful deployment of big data technologies in the cloud,
illustrating the interoperation of two disruptive technologies (cloud meets big data). The Hive query
language provides a convenient way of querying both structured and unstructured data, while offering
programmers or analysts with prior SQL knowledge an easy way to run MapReduce jobs. Finally, this study has
shown that a wealth of useful insights, or business intelligence, can be obtained from large datasets using
Hadoop tools, most conveniently hosted in the cloud. Because the technology is open source, it is
inexpensive to deploy; the only bottleneck is the expertise required to carry out such an important task.
Further research could address AMI data analytics to obtain power consumption insights on an hourly basis,
the use of big data analytics to curb energy (electricity) theft, and analytics for energy and utilities
management.